Dikin walk
Regularized Dikin Walks for Sampling Truncated Logconcave Measures, Mixed Isoperimetry and Beyond Worst-Case Analysis
We study the problem of drawing samples from a logconcave distribution truncated on a polytope, motivated by computational challenges in Bayesian statistical models with indicator variables, such as probit regression. Building on interior point methods and the Dikin walk for sampling from uniform distributions, we analyze the mixing time of regularized Dikin walks. Our contributions are threefold. First, for a logconcave and log-smooth distribution with condition number $\kappa$, truncated on a polytope in $\mathbb{R}^n$ defined with $m$ linear constraints, we prove that the soft-threshold Dikin walk mixes in $\widetilde{O}((m+\kappa)n)$ iterations from a warm initialization. This improves upon prior work, which required the polytope to be bounded and gave a bound that depended on the radius of the bounded region. Moreover, we introduce the regularized Dikin walk using Lewis weights for approximating the John ellipsoid, and we show that it mixes in $\widetilde{O}(n^{2.5}+\kappa n)$ iterations. Second, we extend the mixing time guarantees mentioned above to weakly logconcave distributions truncated on polytopes, provided that they have a finite covariance matrix. Third, going beyond worst-case mixing time analysis, we demonstrate that the soft-threshold Dikin walk can mix significantly faster when only a limited number of constraints intersect the high-probability mass of the distribution, improving the $\widetilde{O}((m+\kappa)n)$ upper bound to $\widetilde{O}(m + \kappa n)$. Additionally, we discuss the per-iteration complexity of the regularized Dikin walk and ways to generate a warm initialization, to facilitate practical implementation.
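The Lewis weights mentioned above can be made concrete. The $\ell_p$ Lewis weights of the rows $a_i^T$ of a matrix $A$ are the weights satisfying $w_i = (a_i^T (A^T W^{1-2/p} A)^{-1} a_i)^{p/2}$ with $W = \mathrm{diag}(w)$; for $p = 2$ they are the leverage scores, and at the fixed point they sum to $n$. The sketch below computes them by plain fixed-point iteration, which is known to converge for $p < 4$. It is only an illustration under that assumption: approximating the John ellipsoid calls for large $p$, which this simple scheme does not cover, and in a sampler the weights would typically be computed for the slack-rescaled matrix $S(x)^{-1}A$ at the current point. The function name `lewis_weights` and its defaults are hypothetical.

```python
import numpy as np


def lewis_weights(A, p=3.0, iters=200):
    """l_p Lewis weights of the rows of A via fixed-point iteration (assumes p < 4)."""
    m, n = A.shape
    w = np.full(m, n / m)  # any positive start; the fixed-point weights sum to n
    for _ in range(iters):
        M = A.T @ (A * (w ** (1.0 - 2.0 / p))[:, None])  # A^T W^{1-2/p} A
        tau = np.einsum("ij,jk,ik->i", A, np.linalg.inv(M), A)  # a_i^T M^{-1} a_i
        w = tau ** (p / 2.0)
    return w


rng = np.random.default_rng(3)
A = rng.standard_normal((30, 4))  # 30 hypothetical constraint rows in R^4
w = lewis_weights(A, p=3.0)
print(w.sum())  # close to n = 4 once the iteration has converged
```

With `p=2.0` the exponent `1 - 2/p` is zero, so a single iteration already returns the leverage scores $a_i^T (A^T A)^{-1} a_i$.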
Faster Sampling from Log-Concave Densities over Polytopes via Efficient Linear Solvers
Mangoubi, Oren, Vishnoi, Nisheeth K.
We present a nearly optimal implementation of a Dikin walk Markov chain for sampling from log-concave densities over a polytope $K = \{\theta : A\theta \le b\}$, with per-step complexity roughly equal to the number of nonzero entries of $A$, while the number of Markov chain steps remains the same. The key technical ingredients are 1) showing that the matrices that arise in this Dikin walk change slowly, 2) deploying efficient linear solvers that leverage this slow change to speed up matrix inversion by reusing information computed in previous steps, and 3) speeding up the computation of the determinantal term in the Metropolis filter step via a randomized Taylor series-based estimator. This result directly improves the runtime for applications that involve sampling from Gibbs distributions constrained to polytopes that arise in Bayesian statistics and private optimization.
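In a Dikin walk with Gaussian proposals, the determinantal term in the Metropolis filter is a ratio such as $\det H(z) / \det H(\theta)$ of the metric at the proposed and current points. When the matrices change slowly, $E = H(\theta)^{-1}H(z) - I$ is small and $\log\det(I+E) = \sum_{k\ge1} (-1)^{k+1}\,\mathrm{tr}(E^k)/k$ converges quickly. The sketch below truncates this series and estimates each trace with Hutchinson's estimator, using only linear solves and matrix-vector products. It is a generic construction under these assumptions, not claimed to be the estimator from the paper; the function name and parameters are hypothetical.

```python
import numpy as np


def logdet_ratio_taylor(H_old, H_new, num_terms=20, num_probes=64, seed=0):
    """Estimate log det(H_new) - log det(H_old) without computing a determinant.

    Uses log det(I + E) = sum_{k>=1} (-1)^(k+1) tr(E^k) / k with E = H_old^{-1} H_new - I,
    valid when every eigenvalue of E lies in (-1, 1). The series is truncated after
    num_terms terms and each trace is estimated as the mean of v^T E^k v over
    Rademacher probe vectors v (Hutchinson's estimator).
    """
    rng = np.random.default_rng(seed)
    n = H_old.shape[0]
    V = rng.choice([-1.0, 1.0], size=(n, num_probes))  # Rademacher probes as columns
    W = V.copy()
    estimate = 0.0
    for k in range(1, num_terms + 1):
        # W <- E W, needing only a solve with H_old and a product with H_new
        W = np.linalg.solve(H_old, H_new @ W) - W
        estimate += (-1) ** (k + 1) / k * np.sum(V * W) / num_probes
    return estimate


rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
H_old = B @ B.T + 5 * np.eye(5)  # a hypothetical SPD metric at the current point
P = rng.standard_normal((5, 5))
H_new = H_old + 0.05 * (P + P.T)  # a slowly changed metric at the proposal
exact = np.linalg.slogdet(H_new)[1] - np.linalg.slogdet(H_old)[1]
est = logdet_ratio_taylor(H_old, H_new, num_probes=2000)
print(exact, est)
```

In a real implementation the solve with `H_old` would reuse a factorization or a warm-started iterative solver rather than calling `np.linalg.solve` each time.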
Gaussian Cooling and Dikin Walks: The Interior-Point Method for Logconcave Sampling
Kook, Yunbum, Vempala, Santosh S.
The connections between (convex) optimization and (logconcave) sampling have been considerably enriched in the past decade with many conceptual and mathematical analogies. For instance, the Langevin algorithm can be viewed as a sampling analogue of gradient descent and has condition-number-dependent guarantees on its performance. In the early 1990s, Nesterov and Nemirovski developed the Interior-Point Method (IPM) for convex optimization based on self-concordant barriers, providing efficient algorithms for structured convex optimization that are often faster than general-purpose methods. This raises the following question: can we develop an analogous IPM for structured sampling problems? In 2012, Kannan and Narayanan proposed the Dikin walk for uniformly sampling polytopes, and an improved analysis was given in 2020 by Laddha, Lee and Vempala. The Dikin walk uses a local metric defined by a self-concordant barrier for linear constraints. Here we generalize this approach by developing and adapting IPM machinery together with the Dikin walk to obtain polynomial-time sampling algorithms. Our IPM-based sampling framework provides an efficient warm start and goes beyond uniform distributions and linear constraints. We illustrate the approach on important special cases, in particular giving the fastest algorithms to sample uniform, exponential, or Gaussian distributions on a truncated PSD cone. The framework is general and can be applied to other sampling algorithms.
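For linear constraints, the local metric is the Hessian of the log barrier $\phi(x) = -\sum_i \log(b_i - a_i^T x)$, namely $\nabla^2\phi(x) = A^T S(x)^{-2} A$ with slacks $s_i(x) = b_i - a_i^T x$. Its unit Dikin ellipsoid stays inside the polytope: if $\sum_i (a_i^T(y-x))^2 / s_i(x)^2 \le 1$, then every $|a_i^T(y-x)| \le s_i(x)$, so $b_i - a_i^T y \ge 0$. The sketch below, an illustration for linear constraints only with hypothetical function names, samples points on the boundary of this ellipsoid and checks that they remain feasible.

```python
import numpy as np


def log_barrier_hessian(A, b, x):
    """Hessian of phi(x) = -sum_i log(b_i - a_i^T x), i.e. A^T S(x)^{-2} A."""
    s = b - A @ x  # slacks, positive for strictly feasible x
    return A.T @ (A / s[:, None] ** 2)


def dikin_boundary_points(A, b, x, num_points=1000, seed=0):
    """Random points y with (y - x)^T H(x) (y - x) = 1, i.e. on the unit Dikin ellipsoid."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(log_barrier_hessian(A, b, x))  # H = L L^T; needs full column rank A
    U = rng.standard_normal((num_points, A.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # uniform directions on the unit sphere
    # For y = x + L^{-T} u we get (y - x)^T H (y - x) = u^T u = 1.
    return x + np.linalg.solve(L.T, U.T).T


rng = np.random.default_rng(2)
A = rng.standard_normal((10, 3))
b = rng.uniform(0.5, 2.0, size=10)  # positive b makes x = 0 strictly feasible
x = np.zeros(3)
Y = dikin_boundary_points(A, b, x)
print(np.all(Y @ A.T <= b + 1e-9))  # every boundary point satisfies A y <= b
```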
Sampling from Log-Concave Distributions over Polytopes via a Soft-Threshold Dikin Walk
Mangoubi, Oren, Vishnoi, Nisheeth K.
Given a Lipschitz or smooth convex function $f: K \to \mathbb{R}$ for a bounded polytope $K \subseteq \mathbb{R}^d$ defined by $m$ inequalities, we consider the problem of sampling from the log-concave distribution $\pi(\theta) \propto e^{-f(\theta)}$ constrained to $K$. Interest in this problem derives from its applications to Bayesian inference and differentially private learning. Our main result is a generalization of the Dikin walk Markov chain to this setting that requires at most $O((md + dL^2R^2) \times md^{\omega-1} \log(\frac{w}{\delta}))$ arithmetic operations to sample from $\pi$ within error $\delta>0$ in the total variation distance from a $w$-warm start. Here $L$ is the Lipschitz constant of $f$, $K$ is contained in a ball of radius $R$ and contains a ball of smaller radius $r$, and $\omega$ is the matrix-multiplication constant. Our algorithm improves on the running time of prior works for a range of parameter settings important for the aforementioned learning applications. Technically, we depart from previous Dikin walks by adding a "soft-threshold" regularizer derived from the Lipschitz or smoothness properties of $f$ to the log-barrier function for $K$. This regularizer allows our version of the Dikin walk to propose updates that have a high Metropolis acceptance ratio for $f$, while at the same time remaining inside the polytope $K$.
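As a rough illustration of the mechanism described above, the sketch below adds a constant soft-threshold term `lam * I` to the log-barrier Hessian, draws a Gaussian proposal shaped by that metric, rejects proposals outside $K$, and applies a Metropolis-Hastings filter for $e^{-f}$ (including the proposal-density ratio, which is where the determinantal term appears). The step size `r`, the constant `lam`, the lazy half-step and all function names are illustrative assumptions; the paper derives its regularizer from the Lipschitz or smoothness constant of $f$ and sets its parameters differently.

```python
import numpy as np


def regularized_metric(A, b, x, lam):
    """Log-barrier Hessian A^T S^{-2} A plus a soft-threshold term lam * I (lam is hypothetical)."""
    s = b - A @ x  # slacks; all positive when x is strictly inside K = {x : A x < b}
    return A.T @ (A / s[:, None] ** 2) + lam * np.eye(A.shape[1])


def log_proposal(A, b, x, z, r, lam):
    """Log density, up to an additive constant, of proposing z from x under N(x, (r^2/n) H(x)^{-1})."""
    n = A.shape[1]
    H = regularized_metric(A, b, x, lam)
    d = z - x
    _, logdet = np.linalg.slogdet(H)
    return 0.5 * logdet - (n / (2 * r**2)) * (d @ H @ d)


def soft_threshold_dikin_walk(A, b, f, x0, steps, r=0.5, lam=1.0, seed=0):
    """Lazy Metropolis-adjusted Dikin walk targeting pi(x) proportional to exp(-f(x)) on {A x < b}."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.asarray(x0, dtype=float)
    samples = np.empty((steps, n))
    for t in range(steps):
        if rng.random() < 0.5:  # lazy half-step: stay put
            samples[t] = x
            continue
        L = np.linalg.cholesky(regularized_metric(A, b, x, lam))  # H(x) = L L^T
        # L^{-T} xi has covariance H(x)^{-1}, so z ~ N(x, (r^2/n) H(x)^{-1})
        z = x + (r / np.sqrt(n)) * np.linalg.solve(L.T, rng.standard_normal(n))
        if np.all(A @ z < b):  # proposals outside K are rejected outright
            log_accept = (-f(z) + log_proposal(A, b, z, x, r, lam)) - (
                -f(x) + log_proposal(A, b, x, z, r, lam)
            )
            if np.log(rng.random()) < log_accept:
                x = z
        samples[t] = x
    return samples


A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])  # K = [-1, 1]^2
b = np.ones(4)
samples = soft_threshold_dikin_walk(A, b, lambda x: 0.5 * x @ x, np.zeros(2), steps=4000)
print(samples.mean(axis=0))  # a standard Gaussian truncated to the box has mean 0
```

The regularizer keeps the metric from degenerating far from the constraints, so proposals keep a bounded scale there, while the barrier term still shrinks steps near the boundary of $K$.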
Sampling from Log-Concave Distributions with Infinity-Distance Guarantees and Applications to Differentially Private Optimization
Mangoubi, Oren, Vishnoi, Nisheeth K.
For a $d$-dimensional log-concave distribution $\pi(\theta)\propto e^{-f(\theta)}$ on a polytope $K$, we consider the problem of outputting samples from a distribution $\nu$ which is $O(\varepsilon)$-close to $\pi$ in infinity-distance $\sup_{\theta\in K}|\log\frac{\nu(\theta)}{\pi(\theta)}|$. Samplers with infinity-distance guarantees are specifically desired for differentially private optimization, because traditional sampling algorithms, which come with total-variation distance or KL divergence bounds, are insufficient to guarantee differential privacy. Our main result is an algorithm that outputs a point from a distribution $O(\varepsilon)$-close to $\pi$ in infinity-distance and requires $O((md+dL^2R^2)\times(LR+d\log(\frac{Rd+LRd}{\varepsilon r}))\times md^{\omega-1})$ arithmetic operations, where $f$ is $L$-Lipschitz, $K$ is defined by $m$ inequalities, is contained in a ball of radius $R$ and contains a ball of smaller radius $r$, and $\omega$ is the matrix-multiplication constant. In particular, this runtime is logarithmic in $\frac{1}{\varepsilon}$ and significantly improves on prior works. Technically, we depart from the prior works that construct Markov chains on a $\frac{1}{\varepsilon^2}$-discretization of $K$ to achieve a sample with $O(\varepsilon)$ infinity-distance error, and present a method to convert continuous samples from $K$ with total-variation bounds into samples with infinity-distance bounds. To achieve improved dependence on $d$, we present a "soft-threshold" version of the Dikin walk which may be of independent interest. Plugging our algorithm into the framework of the exponential mechanism yields similar improvements in the running time of $\varepsilon$-pure differentially private algorithms for optimization problems such as empirical risk minimization of Lipschitz-convex functions and low-rank approximation, while still achieving the tightest known utility bounds.
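The gap between total-variation and infinity-distance guarantees is easy to see on a finite state space. In the sketch below, a discrete illustration with hypothetical helper names and toy distributions, moving a tiny amount of mass off one point leaves the total-variation distance small but makes the infinity-distance infinite. Conversely, a distribution within infinity-distance $d$ of $\pi$ satisfies $\nu(S) \le e^{d}\,\pi(S)$ for every event $S$, which is the multiplicative form of guarantee that pure differential privacy requires.

```python
import numpy as np


def infinity_distance(nu, pi):
    """sup_x |log(nu(x) / pi(x))| for pmfs on a finite set, with pi strictly positive."""
    nu = np.asarray(nu, dtype=float)
    pi = np.asarray(pi, dtype=float)
    with np.errstate(divide="ignore"):  # log(0) = -inf encodes an infinite distance
        return float(np.max(np.abs(np.log(nu) - np.log(pi))))


def total_variation(nu, pi):
    """Total-variation distance (1/2) * sum_x |nu(x) - pi(x)|."""
    return 0.5 * float(np.sum(np.abs(np.asarray(nu, dtype=float) - np.asarray(pi, dtype=float))))


# Toy target: uniform on 1000 points (a stand-in for pi on a discretized K).
pi = np.full(1000, 1e-3)

# Moving the mass of one point onto another: small TV, infinite infinity-distance.
nu_bad = pi.copy()
nu_bad[0], nu_bad[1] = 0.0, 2e-3
print(total_variation(nu_bad, pi), infinity_distance(nu_bad, pi))

# A multiplicative perturbation by factors in [e^-0.05, e^0.05], then renormalized,
# so the infinity-distance is at most 0.1.
rng = np.random.default_rng(4)
nu_good = pi * np.exp(rng.uniform(-0.05, 0.05, size=pi.size))
nu_good /= nu_good.sum()
d = infinity_distance(nu_good, pi)
S = rng.random(pi.size) < 0.3  # an arbitrary event
print(d, nu_good[S].sum() <= np.exp(d) * pi[S].sum())
```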
Fast MCMC sampling algorithms on polytopes
Chen, Yuansi, Dwivedi, Raaz, Wainwright, Martin J., Yu, Bin
Sampling from distributions is a core problem in statistics, probability, operations research, and other areas involving stochastic models [Gem84; Bré91; Rip87; Has70]. Sampling algorithms are a prerequisite for applying Monte Carlo methods in order to approximate expectations and other integrals. Recent decades have witnessed the great success of Markov chain Monte Carlo (MCMC) algorithms; for instance, see the handbook [Bro11] and references therein. These methods are based on constructing a Markov chain whose stationary distribution is equal to the target distribution, and then drawing samples by simulating the chain for a certain number of steps. An advantage of MCMC algorithms is that they only require knowledge of the target density up to a proportionality constant. However, the theoretical understanding of MCMC algorithms used in practice is far from complete. In particular, a general challenge is to bound the mixing time of a given MCMC algorithm, meaning the number of iterations, as a function of the error tolerance δ, the problem dimension d and other parameters, needed for the chain to arrive at a distribution within distance δ of the target. In this paper, we study a certain class of MCMC algorithms designed for the problem of drawing samples from the uniform distribution over a polytope.
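One common way to formalize the mixing time, taking the worst starting state, is $t_{\mathrm{mix}}(\delta) = \min\{t : \max_x \|P^t(x,\cdot) - \pi\|_{\mathrm{TV}} \le \delta\}$. Polytope walks have continuous state spaces, so the sketch below only illustrates the definition on small finite chains where $P^t$ can be computed exactly; the function names and the example chains are hypothetical.

```python
import numpy as np


def mixing_time(P, pi, delta, max_steps=10_000):
    """Smallest t with max_x TV(P^t(x, .), pi) <= delta for a finite transition matrix P."""
    Pt = np.eye(P.shape[0])
    for t in range(1, max_steps + 1):
        Pt = Pt @ P  # row x of Pt is the law of the chain after t steps from state x
        worst_tv = 0.5 * np.abs(Pt - pi[None, :]).sum(axis=1).max()
        if worst_tv <= delta:
            return t
    raise RuntimeError("chain did not mix within max_steps")


def lazy_cycle_walk(k):
    """Lazy random walk on a k-cycle (k >= 3): stay w.p. 1/2, else move to a uniform neighbor."""
    P = 0.5 * np.eye(k)
    for x in range(k):
        P[x, (x + 1) % k] += 0.25
        P[x, (x - 1) % k] += 0.25
    return P


P = lazy_cycle_walk(20)
pi = np.full(20, 1 / 20)  # the uniform distribution is stationary for this symmetric chain
print(mixing_time(P, pi, 0.25), mixing_time(P, pi, 0.01))
```

A smaller tolerance δ requires more steps, matching the dependence on δ described above.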